Phone Duration Modeling for LVCSR Using Neural Networks
Abstract
We describe our work on incorporating probabilities of phone durations, learned by a neural net, into an ASR system. Phone durations are incorporated via lattice rescoring. The input features are derived from the phone identities of a context window of phones, plus the durations of preceding phones within that window. Unlike some previous work, our network outputs the probability of different durations (in frames) directly, up to a fixed limit. We evaluate this method on several large-vocabulary tasks. We consistently see improvements in word error rate, but the improvements are smaller when the lattices are generated with neural-net-based acoustic models.
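The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the window sizes (`left`, `right`), the padding symbol, the clamping of over-long durations into the last output bucket, and the rescoring weight are all assumptions made for the example. The network itself is left out; any model that produces one logit per duration bucket would fit.

```python
import math


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_features(phones, durations, pos, num_phones, left=2, right=1, pad_id=0):
    """Features for the phone at `pos`: one-hot identities of the phones in
    the context window, plus raw durations (in frames) of the preceding
    phones inside that window. Window sizes and padding are assumptions."""
    feats = []
    for offset in range(-left, right + 1):
        j = pos + offset
        phone_id = phones[j] if 0 <= j < len(phones) else pad_id
        feats.extend(one_hot(phone_id, num_phones))
    for offset in range(-left, 0):
        j = pos + offset
        feats.append(float(durations[j]) if j >= 0 else 0.0)
    return feats


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def duration_log_prob(probs, duration, max_duration):
    """Log-probability of a duration in frames (1-based). Durations above
    the fixed limit share the last bucket (an assumption of this sketch)."""
    bucket = min(duration, max_duration) - 1
    return math.log(probs[bucket])


def rescore_path(path_score, duration_log_probs, weight=1.0):
    """Add weighted duration log-probabilities to an existing lattice path
    score (a log-domain score where higher is better; hypothetical setup)."""
    return path_score + weight * sum(duration_log_probs)


# Example: three phones with ids 1, 2, 3 lasting 5, 7 and 4 frames.
phones = [1, 2, 3]
durations = [5, 7, 4]
features = build_features(phones, durations, pos=1, num_phones=4)

# Stand-in for the network output: all-zero logits over 5 duration buckets.
probs = softmax([0.0] * 5)
log_p = duration_log_prob(probs, durations[1], max_duration=5)
new_score = rescore_path(-10.0, [log_p], weight=0.5)
```

In a real system the logits would come from the trained network applied to `features`, and `rescore_path` would be applied to every path (or arc) in the lattice before picking the best hypothesis.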